Abstract



A double-stranded DNA molecule subjected to a pulling force acting on the
strand terminals exhibits a partially denatured structure or can be
completely unzipped, depending on the magnitude of the pulling force. A scaling
argument for the relationships among the basic length scales is presented that
takes into account the heterogeneity of the sequence. The result agrees with our
numerical simulation data, which provide a critical test of the power laws in
the unzipping transition region.
The structure of double-stranded DNA (dsDNA) has recently been probed at the microscopic
level through mechanical manipulation by an external force. From a theoretical modelling
perspective, we have only begun to understand these experimental results, which are usually
reported in the form of manipulating force versus extension. Most of the current theoretical
work is based on a homopolymer model, in which the binding forces between nucleotides of
various types are treated equally. One of the theoretical goals is to reproduce the
force-extension phase diagram observed experimentally, together with other interesting
conformational properties that characterize the unzipping transition. Understanding the
system-dependent and the universal features of these complex systems is certainly a
challenging task, as the pairing along the two DNA strands is in general heterogeneous
rather than homogeneous [ ].

The present article serves two purposes. Firstly, a physically direct scaling argument is
employed to analyze a simplified heterogeneous dsDNA model, which exhibits an unzipping
transition under an external pulling force acting upon the terminal end. The usefulness
of some well-tested scaling arguments in polymer theory is demonstrated, both in terms
of a Gaussian model and of a freely jointed bond model that correctly addresses the finite
extensibility of each strand. Secondly, based on a continuous description of the problem,
numerical simulations are presented to test some of the obtained power laws, in particular
those near the unzipping transition. Special attention is paid to contrasting the
sequence-dependent features with the averaged, sequence-independent features.

To start with, we consider two bound DNA strands as shown schematically in Fig. 1,
where some segments interact with each other predominantly through repulsion and form
excursion loops, and other segments interact predominantly through attraction and remain
mostly bound. To simplify the consideration, we assume here that the interaction energy of
a base pair confined within the binding distance can be either $+\epsilon$ (repulsive) or
$-\epsilon$ (attractive). Furthermore, we assume that the averaged free energy per base
pair is $-|f_0|$ when the dsDNA is below the melting temperature with no additional
pulling force.

When an external pulling force is applied to the two terminal nucleotides (or two terminal
groups) of the dsDNA, we assume that on average M base units can be pulled out. Hence,
the reduced free energy of the entire chain reads

$$ \beta\mathcal{F} = -(N-M)\,|f_0| + \frac{D}{2}\,\frac{Z^2}{M a^2} - \beta F Z - \sqrt{M}\,\epsilon, \tag{1} $$

where the first term describes the fact that, on average, $N - M$ base pairs still remain in
the bound state. The three additional terms describe the effects of the pulling
process, which results in M separated base pairs and an average distance Z between the two
terminals. One of the essential assumptions here is that the polymer segment in the stretched
portion obeys a Gaussian distribution [ ], an assumption that will be questioned below. In
order to make a more precise comparison with a previous result, a prefactor D/2 is explicitly
introduced in the second term, to make it exact even at the level of numerical coefficients.
The third term is simply the potential energy reduction of the terminal pair when an external
pulling field is applied.

The last term in Eq. 1 describes the binding energy reduction of the pulled portion.
When the dsDNA is being pulled, two possibilities might occur near the end of the separated
portion. In the event that this portion is connected to an originally bound segment, a
stronger force is required to unzip this locked portion. In the event that this portion is
connected to a segment that is an excursion loop, the dsDNA would spontaneously unwind
the excursion loop. Hence, under a fixed pulling force and distance Z, M may vary and the
entire separated portion of M base pairs would adjust itself so that it has a lower net
energy. Since the binding energy fluctuates randomly between $+\sqrt{M}\,\epsilon$ and
$-\sqrt{M}\,\epsilon$, the average energy that the pulled portion can acquire is
$-\sqrt{M}\,\epsilon$, where the square root is a result of the randomness in the system.
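
The square-root estimate can be illustrated with a short Monte Carlo sketch. It assumes, as a simplification that goes slightly beyond the text, that each of the M base pairs independently contributes $+\epsilon$ or $-\epsilon$ with equal probability; the function name and parameters are hypothetical and chosen only for this illustration.

```python
import math
import random

def rms_binding_energy(M, eps=1.0, samples=2000, seed=0):
    """Root-mean-square of the summed energy of M pairs, each +eps or -eps at random.

    Assumption (not from the paper): pair energies are independent and unbiased.
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(samples):
        energy = sum(rng.choice((-eps, eps)) for _ in range(M))
        total_sq += energy * energy
    return math.sqrt(total_sq / samples)

if __name__ == "__main__":
    # The two printed columns after M should be close: sampled RMS and sqrt(M)*eps.
    for M in (16, 64, 256):
        print(M, rms_binding_energy(M), math.sqrt(M))
```

Under these assumptions the typical size of the summed energy grows as $\sqrt{M}\,\epsilon$ rather than $M\epsilon$, which is the origin of the square root in the last term of the free energy.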

Now the free energy has to be minimized with respect to Z and M as adjustable parameters.
Minimizing with respect to Z leads to

$$ Z = a^2 \beta F M / D, \tag{2} $$

which has a linear form in F and reflects the fact that the pulled portion is near a highly
stretched limit. Inserting Eq. 2 into Eq. 1, we have

$$ \beta\mathcal{F} = -N|f_0| + tM - \sqrt{M}\,\epsilon, \tag{3} $$

where we have defined a reduced force parameter

$$ t = |f_0| - \beta^2 F^2 a^2/(2D). \tag{4} $$
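
The substitution step can be written out explicitly. The lines below are a worked check rather than part of the original derivation; they assume the stretching terms $D Z^2/(2 M a^2) - \beta F Z$ described in the text and use $\mathcal{F}$ for the free energy to keep it distinct from the force F.

```latex
\begin{align*}
\left[\frac{D Z^2}{2 M a^2} - \beta F Z\right]_{Z = a^2 \beta F M / D}
  &= \frac{a^2 \beta^2 F^2 M}{2D} - \frac{a^2 \beta^2 F^2 M}{D}
   = -\frac{\beta^2 F^2 a^2}{2D}\, M, \\
\beta\mathcal{F}
  &= -(N - M)\,|f_0| - \frac{\beta^2 F^2 a^2}{2D}\, M - \sqrt{M}\,\epsilon
   = -N|f_0| + \left[\,|f_0| - \frac{\beta^2 F^2 a^2}{2D}\right] M - \sqrt{M}\,\epsilon .
\end{align*}
```

The bracket multiplying M in the last expression is the reduced force parameter t.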

Equation 3 is our final expression for the free energy of the system. We see
that when t < 0, there is no finite minimum in the free energy, as M tends to increase
without bound; as a consequence the dsDNA is entirely unzipped. When t > 0, however, there
is an equilibrium state, where, by minimizing the above expression with respect to M, the
average number of unzipped base pairs is

$$ M = (\epsilon/t)^2/4 \tag{5} $$

and the free energy is

$$ \beta\mathcal{F} = -N|f_0| - \epsilon^2/(4t). \tag{6} $$
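
As a numerical cross-check of the minimization, the following sketch minimizes $tM - \sqrt{M}\,\epsilon$ over M with a golden-section search and compares the result with the closed forms $(\epsilon/t)^2/4$ and $-\epsilon^2/(4t)$. The function names, the search interval and the sample values of t and $\epsilon$ are assumptions made for this illustration only.

```python
import math

def reduced_free_energy(M, t, eps, N=0.0, f0_abs=0.0):
    """Reduced free energy -N|f0| + t*M - sqrt(M)*eps for M unzipped base pairs."""
    return -N * f0_abs + t * M - math.sqrt(M) * eps

def minimize_over_M(t, eps, M_max=1.0e6, iters=200):
    """Golden-section search for the minimum over M in [0, M_max]; requires t > 0.

    For t > 0 the function is convex in M, so it has a single minimum.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, M_max
    for _ in range(iters):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if reduced_free_energy(m1, t, eps) < reduced_free_energy(m2, t, eps):
            hi = m2          # minimum lies in [lo, m2]
        else:
            lo = m1          # minimum lies in [m1, hi]
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    t, eps = 0.05, 1.0       # illustrative values, not taken from the paper
    M_num = minimize_over_M(t, eps)
    print(M_num, (eps / t) ** 2 / 4.0)                           # numerical vs closed form
    print(reduced_free_energy(M_num, t, eps), -eps ** 2 / (4.0 * t))
```

Both comparisons rest only on the form of the free energy quoted above; for t approaching zero the minimum moves to ever larger M and eventually leaves any finite search interval, which mirrors the unzipping transition.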

The unzipping transition takes place when t = 0, which is equivalent to saying that there
exists a critical unzipping force,

$$ F_c = \sqrt{2D|f_0|}/(\beta a). \tag{7} $$

The main conclusion drawn here, namely, Eqs. 5, 6, and 7, is consistent with a much
more involved derivation [ ]. The numerical coefficient of the critical force is exact, in
view of the fact that the terms originally involved in the free energy are exact to start
with. The free energy of the entire system is no longer self-averaging, as the last term
in Eq. 3 is not proportional to N. Indeed, from the physical perspective, the equilibrium
state of the pulled portion is established regardless of the total length of the dsDNA, as
long as the pulled portion is still shorter than the total length. The independence of N in
the last term is merely a statement that for different N, the equilibrium state of the
pulled portion remains essentially the same.

This simple treatment of the unzipping transition of the dsDNA produces several
relationships between the basic length scales and the transition properties. However, there
is one important feature of the stretched portion that has been ignored both above and in
Ref. [ ], and that might become a pitfall of the entire derivation. The use of a Gaussian
entropic term for the stretched portion in Eq. 1 can be traced back to the bead-spring model
commonly used in polymer modelling. As a spring can be stretched without bound under a strong
pulling force, the current model physically allows the pulled segment to stretch beyond the
natural length Ma, as demonstrated in Eq. 2. Hence, it becomes desirable to incorporate
the effect of finite extensibility in the above discussion.

It has been previously demonstrated by Grosberg and Khokhlov [ ] that a better
representation of a stretched polymer is the freely jointed bond model, in which a rigid
bond models a polymer monomer unit. Instead of the second and third terms in Eq. 1,
the contribution to the free energy due to the entropic penalty in the strongly stretched
limit (D = 3) can be shown to be

$$ \beta\mathcal{F}_{\rm stretch} = -M \ln\left[\frac{\sinh(\beta F a)}{\beta F a}\right] \tag{8} $$

and the separation distance can be directly calculated,

$$ Z = M a\,[\coth(\beta F a) - 1/(\beta F a)]. \tag{9} $$

Replacing the second and the third terms in Eq. 1, we have

$$ \beta\mathcal{F} = -(N-M)\,|f_0| - M \ln\left[\frac{\sinh(\beta F a)}{\beta F a}\right] - \sqrt{M}\,\epsilon. \tag{10} $$
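
The difference between the two descriptions of the stretched portion can be made concrete with a small sketch that evaluates the Gaussian extension $Z = a^2\beta F M/D$ and the freely jointed extension $Z = Ma[\coth(\beta F a) - 1/(\beta F a)]$ side by side. The function names and the sample forces are assumptions for illustration; lengths are measured in units of a and the force enters through $x = \beta F a$.

```python
import math

def gaussian_extension(x, M=1.0, a=1.0, D=3):
    """Gaussian-chain extension Z = a^2 * beta*F * M / D, written with x = beta*F*a."""
    return a * x * M / D

def fjc_extension(x, M=1.0, a=1.0):
    """Freely jointed extension Z = M a [coth(x) - 1/x] (the Langevin function)."""
    if abs(x) < 1e-4:
        return M * a * x / 3.0      # leading Taylor term, avoids cancellation near x = 0
    return M * a * (1.0 / math.tanh(x) - 1.0 / x)

if __name__ == "__main__":
    # Extension per bond, in units of a, for increasing reduced force x.
    for x in (0.01, 0.1, 1.0, 5.0, 50.0):
        print(x, gaussian_extension(x), fjc_extension(x))
```

At small x the two expressions coincide, while at large x the Gaussian form grows past the contour length Ma and the freely jointed form saturates below it, which is the finite-extensibility effect discussed in the text.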

The linear dependence of the stretching free energy on M essentially renders the same scaling
theory as discussed above. Indeed, the free energy expression in Eq. 6 remains the same
and the only change is the definition of t, which now reads

$$ t = |f_0| - \ln\left[\frac{\sinh(\beta F a)}{\beta F a}\right]. \tag{11} $$

We note that this expression recovers the same form as in Eq. 4 (with D = 3) in the small
$\beta F a$ limit, where $\ln[\sinh(x)/x] \approx x^2/6$.

Hence, when the finite extensibility is taken into account in the dsDNA model, the critical
force can be implicitly solved by finding the root of t = 0 in Eq. 11. The power laws
in Eqs. 5 and 6 remain the same.
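
A minimal sketch of this root-finding step is given below. It assumes that the freely jointed reduced force parameter is $t = |f_0| - \ln[\sinh(\beta F a)/(\beta F a)]$, as follows from replacing the Gaussian stretching terms by the freely jointed one, and solves t = 0 for $x = \beta F_c a$ by bisection. The function names and the sample value of $|f_0|$ are hypothetical.

```python
import math

def log_sinhc(x):
    """ln[sinh(x)/x] for x > 0, evaluated stably for small and large x."""
    if x < 1e-3:
        return x * x / 6.0                      # leading Taylor term
    # ln sinh(x) = x + ln(1 - exp(-2x)) - ln 2
    return x + math.log1p(-math.exp(-2.0 * x)) - math.log(2.0 * x)

def critical_force_fjc(f0_abs, iters=200):
    """Solve ln[sinh(x)/x] = |f0| for x = beta*F_c*a by bisection (needs |f0| > 0)."""
    lo, hi = 0.0, 1.0
    while log_sinhc(hi) < f0_abs:               # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_sinhc(mid) < f0_abs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_force_gaussian(f0_abs, D=3):
    """Gaussian-model result beta*F_c*a = sqrt(2 D |f0|)."""
    return math.sqrt(2.0 * D * f0_abs)

if __name__ == "__main__":
    f0_abs = 0.02                               # illustrative value only
    print(critical_force_fjc(f0_abs), critical_force_gaussian(f0_abs))
```

Because $\ln[\sinh(x)/x]$ lies below $x^2/6$ for x > 0, the freely jointed critical force comes out slightly larger than the Gaussian one with D = 3, and the two agree when $|f_0|$ is small.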



The rest of this article is devoted to the discussion of numerical simulations that can
be used to address system-specific features in comparison with these universal features.
Returning to the Gaussian description of the conformation of both strands of DNA, we write
a more rigorous expression for the probability functional in one dimension, after Lubensky
and Nelson (LN) [ ]:

$$ P_0[z(s),\{s\}] \propto \exp\left\{ -\frac{1}{2a^2}\int_0^N \left[\frac{dz(s)}{ds}\right]^2 ds - \beta\int_0^N W[z(s),s]\,ds \right\} \tag{12} $$

where z(s) describes the separation distance of the base pair labelled s in the dsDNA, and W
is a strong binding potential, which depends on the (disordered) sequence of nucleotides
on both strands. In a Gaussian chain model, the Kuhn length a in this reduced probability
function is $\sqrt{2}$ times the Kuhn length of each single strand of DNA [10]. The unzipping
probability is then simply

$$ P_{\rm unzip}[z(s),\{s\}] = P_0[z(s),\{s\}]\,\exp\{\beta F \cdot z(N)\} \tag{13} $$

with the assumption that the pulling force is directed along the z direction and acts on the
Nth monomer. The {s} dependence in these functions stresses the fact that the interaction
between the base pairs depends on the given sequence.

A mathematically similar problem to the above probability distribution function arises
in the study of random copolymer localization in an external field, specifically, at a sharp
interface [ ]. The basic idea of our numerical simulations is to represent the disordered
heterogeneity of the system by generating an ensemble of dsDNA models that have an explicit
sequence dependence of the binding interaction. The calculation of the probability function
of the LN model is then translated into solving a Schrödinger-type equation. For each
generated sequence, the Schrödinger equation is solved and thus yields an exact result with
no arbitrary approximations [12].



We have used basically the same numerical procedure as described in Ref. [12], computing
the probability function, $\psi(z,s)$, of finding the sth base pair with a separation
distance z. The calculation of $\psi(z,s)$ in a potential field can be reduced to solving a
time-dependent Schrödinger equation,

$$ -\frac{\partial \psi(z,s)}{\partial s} = -\frac{a^2}{2}\,\frac{\partial^2 \psi(z,s)}{\partial z^2} + \beta W(z,s)\,\psi(z,s), \tag{14} $$

where the "wave function" $\psi$ is subject to the initial condition $\psi(z,0) = 1$. The
probability function for the terminal pair to have a separation distance z, q(z, {s}), is
given by q(z, {s}) = $\psi(z,N)$. At this stage, q(z, {s}) has an explicit sequence
dependence.

For further numerical development we must choose a functional form for the interaction
potential, and we write

$$ W[z,\{s\}] = \zeta(s)\,\epsilon\, w(z) \tag{15} $$

with w(z) being modelled by a Gaussian potential well, $w(z) = -\exp[-(z/a)^2]/\sqrt{\pi}$.
The coefficient $\zeta(s)$ represents the binding energy of different pairs, with $\epsilon$
assumed here positive. In the remainder of this article we describe numerical results
obtained with the choice $\beta\epsilon = 1$.

There are in total 10 possible pairs of nucleotides in DNA, corresponding to 10 different
values of $\zeta(s)$, where guanine-cytosine and adenine-thymine pairings are known to be
attractive. Whether or not the disordered sequence of bases along DNA is truly random is
still a debatable question. However, for the present purpose, it suffices to assume that
$\zeta(s)$ takes a statistically independent random value for each s. In particular, we
assume that $\zeta(s)$ is a random number distributed uniformly between $-1$ and 1, as a
theoretical abstraction. The numerical data for a biased randomness will be discussed
elsewhere [14].
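
To make the procedure concrete, the sketch below propagates $\psi(z,s)$ along one random sequence. It is not the scheme of Ref. [12]: it assumes a simple operator splitting (a Boltzmann factor $\exp[-\beta\epsilon\,\zeta(s)\,w(z)]$ for each base pair followed by explicit finite-difference diffusion with coefficient $a^2/2$), reflecting walls at z = 0 and at a cutoff z_max, and system sizes far smaller than those used for the figures. All function names, grid parameters and default values are hypothetical.

```python
import math
import random

def solve_sequence(zeta, beta_eps=1.0, a=1.0, z_max=40.0, dz=0.5, substeps=8):
    """Propagate psi(z, s) along one sequence; return the z grid and q(z) = psi(z, N).

    psi is rescaled at every base pair, so q(z) is known only up to a constant factor.
    """
    n = int(round(z_max / dz)) + 1
    z = [i * dz for i in range(n)]
    w = [-math.exp(-(zi / a) ** 2) / math.sqrt(math.pi) for zi in z]
    r = 0.5 * a * a * (1.0 / substeps) / (dz * dz)   # diffusion number of one sub-step
    assert r <= 0.5, "explicit diffusion step would be unstable"
    psi = [1.0] * n                                  # initial condition psi(z, 0) = 1
    for zeta_s in zeta:
        # Boltzmann weight of the pairing potential W = zeta(s) * eps * w(z)
        psi = [p * math.exp(-beta_eps * zeta_s * wi) for p, wi in zip(psi, w)]
        for _ in range(substeps):                    # free diffusion over one unit of s
            new = psi[:]
            for i in range(n):
                left = psi[i - 1] if i > 0 else psi[1]            # reflecting wall at 0
                right = psi[i + 1] if i < n - 1 else psi[n - 2]   # reflecting wall at z_max
                new[i] = psi[i] + r * (left - 2.0 * psi[i] + right)
            psi = new
        top = max(psi)                               # rescale to avoid overflow
        psi = [p / top for p in psi]
    return z, psi

def mean_separation(z, q, beta_F):
    """Mean terminal separation with weights q(z) * exp(beta*F*z)."""
    weights = [qi * math.exp(beta_F * zi) for zi, qi in zip(z, q)]
    return sum(zi * wi for zi, wi in zip(z, weights)) / sum(weights)

def random_sequence(N, seed=0):
    """zeta(s) drawn independently and uniformly from [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(N)]

if __name__ == "__main__":
    z, q = solve_sequence(random_sequence(200))
    for beta_F in (0.0, 0.1, 0.2, 0.3):
        print(beta_F, mean_separation(z, q, beta_F))
```

Repeating the calculation for many seeds and averaging the mean separation gives a disorder average in the spirit of the one described in the text, although with the small N and z_max assumed here the results are dominated by finite-size effects.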

Figure 2 shows the relationship between the force and the separation distance z for 10
different, randomly generated disordered sequences. In producing the simulation data, a
finite system with a limited size of z < 1600 was considered; this finite-size effect shows
up in all curves, as they cannot exceed a separation distance of 1600. A length of
N = 32000 units has been used here, which guarantees a minimal finite-size effect due to
the modelled DNA length [13]. The step-like functions in this plot are a signature of the
sequence disorder. This figure is a clear indication that for a specific dsDNA model (and
indeed for real dsDNA) a variety of force-distance curves can be obtained, with the
unzipping transition occurring near an averaged critical force. The critical force for each
sequence, however, varies. For reference, the anticipated critical force, averaged over the
sequence ensemble, is around $\beta F_c a = 0.3724$ from an accurate estimate of the bound
free energy $f_0$ of this system (see Eq. 7) [14].

Though each sequence might show strong individual characteristics, the universal behavior
around which the individual properties fluctuate can be obtained by taking an average
over the sequence ensemble. In order to do so, we have produced 10^ sample DNA models,
each being examined through the solution of the Schrödinger equation. The average
separation, $\langle z \rangle$, as a function of the pulling force F, is presented in
Fig. 2 as the thick dark curve, where the average $\langle \ldots \rangle$ represents
sampling in the disorder ensemble.

The same curve is also presented in Fig. 3 by circles as a function of $F_c - F$, on a double
logarithmic scale. We see that within the region of [0.03, 0.02], $\langle z \rangle$ indeed
follows a power-law behavior,

$$ \langle z \rangle \propto (F_c - F)^{-q_1} \tag{16} $$

with a fitted slope of $q_1$ = 0.194, directly verifying the scaling exponent of 2 in Eq. 5.
Deviations from the scaling can be seen in the region of smaller $F_c - F$, where
$\langle z \rangle$ saturates to a finite value. As a direct consequence of the compound
finite-size effect shown for various sequences in Fig. 2, the averaged $\langle z \rangle$
is more drastically affected by the finite size of the system.

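LN have quoted two other power laws. Translating the averages of M into the averages
of z permits us to examine these power laws:

$$ \delta z_1^2 = \langle \bar{z}^2 \rangle - \langle \bar{z} \rangle^2 \propto (F_c - F)^{-q_2} \tag{17} $$

and

$$ \delta z_2^2 = \langle \overline{z^2} \rangle - \langle \bar{z}^2 \rangle \propto (F_c - F)^{-q_3}. \tag{18} $$

Here the overbar is taken to denote the thermal average for a fixed sequence, and the
angular brackets the average over the disorder ensemble.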

The square and diamond symbols in Fig. 3 represent our simulation data for $\delta z_1^2$
and $\delta z_2^2$. By fitting straight lines to these curves in the region [0.03, 0.02], we
obtained $q_2$ = 3.99 and $q_3$ = 2.94, which agree well with the analytical determination
of these exponents [ ].
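
The straight-line fits on a double logarithmic scale amount to a least-squares regression of ln(y) against ln(x), with x standing for $F_c - F$. The sketch below shows the procedure on synthetic data generated from an exact power law; the data, the fitting window and the function name are assumptions for illustration and are not the simulation data of this article.

```python
import math

def fit_power_law(x, y):
    """Least-squares slope and intercept of ln(y) versus ln(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

if __name__ == "__main__":
    # Synthetic test data y = 3 * x**(-2); a real fit would use measured averages.
    xs = [0.02 + 0.001 * i for i in range(11)]
    ys = [3.0 * v ** -2 for v in xs]
    slope, intercept = fit_power_law(xs, ys)
    print(-slope, math.exp(intercept))   # exponent q and prefactor of y = A * x**(-q)
```

For a quantity that diverges as $(F_c - F)^{-q}$ the fitted slope is negative, so the exponent q is read off as minus the slope; restricting the fit to a window away from the saturation region matters more than the fitting method itself.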

To summarize, we have carried out a scaling analysis of the unzipping transition
phenomenon of a heterogeneous dsDNA model. Our main conclusions, on the free energy and
stretching separation, agree with the results drawn by previous authors. These results
compare favorably with the numerical simulation data, which, for the first time, demonstrate
the relationship between the sequence-dependent and universal features.

Financial support of this work is provided by the Natural Science and Engineering Re- 
search Council of Canada and the grant associated with the Premier's Research Excellence 
FIGURES

FIG. 1. Schematic representation of the dsDNA model considered. Each pair interacts with
either $+\epsilon$ or $-\epsilon$, while the sequence is assumed heterogeneously disordered.

FIG. 2. Typical unzipping force versus extension curves for 10 different random sequences. The
thick curve represents an average over 10^ independently generated samples.

FIG. 3. Scaling behavior of the unzipped extension as a function of the force difference
$F_c - F$ (circles). The other two power laws, Eqs. 17 and 18, are examined in terms of the
numerical data corresponding to squares and diamonds.